Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

Abstract

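Action recognition has been a heated topic in computer vision because of its wide application in vision systems. Previous approaches achieve improvement by fusing the modalities of skeleton sequence and RGB video. However, such methods pose a dilemma between accuracy and efficiency because of the high complexity of the RGB video network. To solve this problem, we propose a multi-modality feature fusion network that combines the skeleton sequence with an RGB frame instead of the RGB video, as the key information contained in the combination of skeleton sequence and RGB frame is close to that of skeleton sequence and RGB video. In this way, the complementary information is retained while the complexity is reduced by a large margin. To better explore the correspondence between the two modalities, a two-stage fusion framework is introduced in the network. In the early fusion stage, we introduce a skeleton attention module that projects the skeleton sequence on the single RGB frame to help the RGB frame focus on the limb movement regions. In the late fusion stage, we propose a cross-attention module to fuse the skeleton feature and the RGB feature by exploiting the correlation. Experiments on two benchmarks, NTU RGB+D and SYSU, show that the proposed model achieves competitive performance compared with state-of-the-art methods while reducing the complexity of the network.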
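The abstract does not give the internals of the late-stage cross-attention module, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two feature sets, not the authors' implementation. The token counts (25 skeleton tokens, 49 RGB tokens), the feature width of 64, the absence of learned projections, and the mean-pool-and-concatenate fusion are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Each query row attends over the rows of keys_values.

    No learned projection matrices are used here (an assumption that
    keeps the sketch short); a trained module would normally have them.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values, weights

def fuse(skeleton_feat, rgb_feat):
    # Skeleton tokens gather context from RGB tokens, and vice versa.
    skel_ctx, _ = cross_attention(skeleton_feat, rgb_feat)
    rgb_ctx, _ = cross_attention(rgb_feat, skeleton_feat)
    # Hypothetical fusion: mean-pool each stream, then concatenate.
    return np.concatenate([skel_ctx.mean(axis=0), rgb_ctx.mean(axis=0)])

rng = np.random.default_rng(0)
skeleton_feat = rng.normal(size=(25, 64))  # hypothetical: 25 skeleton tokens
rgb_feat = rng.normal(size=(49, 64))       # hypothetical: 7x7 RGB feature map
fused = fuse(skeleton_feat, rgb_feat)
print(fused.shape)
```

The fused vector has twice the feature width because the two attended streams are concatenated; a classifier head would typically consume it.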

Similar resources

tight frame approximation for multi-frames and super-frames

In this thesis, we study a generator for multi-frames or super-frames generated under the action of a projective unitary representation of discrete countable groups. Examples of such frames include Gabor multi-frames, Gabor super-frames, and frames for shift-invariant subspaces. We show that there exists a unique normalized tight multi-frame (super-frame) generator that lies at minimum distance from it. Similar problems for dual frames are also posed, and some ...

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Multi-frame Super Resolution for Improving Vehicle Licence Plate Recognition

License plate recognition (LPR) by digital image processing, which is widely used in traffic monitoring and control, is one of the most important goals in Intelligent Transportation Systems (ITS). In real ITS deployments, the resolution of input images is often not very high, owing to technological challenges and the cost of high-resolution cameras. However, when the license plate image is taken at low resolution, the license...

Human Action Recognition Via Multi-modality Information

In this paper, we propose pyramid appearance and global structure action descriptors on both RGB and depth motion history images, together with a model-free method for human action recognition. In the proposed algorithm, we first construct motion history images for both the RGB and depth channels; at the same time, depth information is employed to filter the RGB information. After that, different action descriptors ...

A SURF-Based Spatio-Temporal Feature for Feature-Fusion-Based Action Recognition

In this paper, we propose a novel spatio-temporal feature which is useful for feature-fusion-based action recognition with Multiple Kernel Learning (MKL). The proposed spatio-temporal feature is based on moving SURF interest points grouped by Delaunay triangulation and on their motion over time. Since this local spatio-temporal feature has different characteristics from holistic appearance feat...

Journal

Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications

Year: 2022

ISSN: 1551-6857, 1551-6865

DOI: https://doi.org/10.1145/3491228